Overview

Dataset statistics

Number of variables: 15
Number of observations: 8679298
Missing cells: 22238246
Missing cells (%): 17.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 993.3 MiB
Average record size in memory: 120.0 B
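
The two derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; neither definition is stated in the report itself.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 15
n_observations = 8_679_298
missing_cells = 22_238_246
total_size_mib = 993.3

# Assumption: the percentage is computed over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures reported above.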

Variable types

Numeric: 4
Categorical: 7
Unsupported: 4

Alerts

EntryDate has a high cardinality: 2911 distinct values (High cardinality)
EntryTime has a high cardinality: 27227 distinct values (High cardinality)
Code has a high cardinality: 2025 distinct values (High cardinality)
Analyte has a high cardinality: 3479 distinct values (High cardinality)
ValueNumber has a high cardinality: 23084 distinct values (High cardinality)
ValueText has a high cardinality: 32847 distinct values (High cardinality)
Unit has a high cardinality: 195 distinct values (High cardinality)
Report is highly overall correlated with ID (High correlation)
ID is highly overall correlated with Report (High correlation)
Code has 8106112 (93.4%) missing values (Missing)
NCLP has 573524 (6.6%) missing values (Missing)
ValueNumber has 1338881 (15.4%) missing values (Missing)
ValueText has 7340434 (84.6%) missing values (Missing)
RefHigh has 1747017 (20.1%) missing values (Missing)
RefLow has 1747253 (20.1%) missing values (Missing)
Unit has 1383102 (15.9%) missing values (Missing)
RefHigh is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
RefLow is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
NLCP_E is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
NCLP_E is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2022-11-26 18:44:53.784016
Analysis finished: 2022-11-26 18:48:03.954673
Duration: 3 minutes and 10.17 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
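
A report of this kind is typically generated with a few lines of pandas-profiling. The sketch below is not the configuration used for this run (that is in config.json, which is not reproduced here); the function name `build_report`, the title, and the output path are illustrative assumptions.

```python
def build_report(df, output_path="report.html"):
    """Profile a DataFrame and write an HTML report (illustrative sketch)."""
    # Imported lazily so the function can be defined without the package installed.
    from pandas_profiling import ProfileReport

    # The title is a placeholder, not the one used for this report.
    profile = ProfileReport(df, title="Pandas Profiling Report")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report(df)` on the loaded laboratory table would write `report.html` to the working directory; the input file name and loading step are left out because the report does not state them.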

Variables

Patient
Real number (ℝ)

Distinct: 14158
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 455221.28
Minimum: 85
Maximum: 1472547
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.2 MiB

Quantile statistics

Minimum: 85
5-th percentile: 8140
Q1: 60424
median: 180869
Q3: 1181123
95-th percentile: 1243465
Maximum: 1472547
Range: 1472462
Interquartile range (IQR): 1120699

Descriptive statistics

Standard deviation: 495840.26
Coefficient of variation (CV): 1.0892291
Kurtosis: -1.238934
Mean: 455221.28
Median Absolute Deviation (MAD): 155062
Skewness: 0.75973107
Sum: 3.9510011 × 10^12
Variance: 2.4585757 × 10^11
Monotonicity: Not monotonic
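
Several of the Patient statistics are derived from others in the same tables. As a minimal sketch, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum), the relationships can be checked like this:

```python
# Values copied from the Patient tables above.
mean = 455221.28
std = 495840.26
q1, q3 = 60424, 1181123
minimum, maximum = 85, 1472547

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"CV = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"IQR = {iqr}, range = {value_range}")
```

Small differences in the last digit are expected because the inputs are already rounded in the report.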
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
210904 12656
 
0.1%
1203600 12262
 
0.1%
1248228 11517
 
0.1%
193098 10777
 
0.1%
210732 10225
 
0.1%
583928 9983
 
0.1%
1205878 9664
 
0.1%
212373 9280
 
0.1%
1205251 9147
 
0.1%
1236850 9122
 
0.1%
Other values (14148) 8574665
98.8%
ValueCountFrequency (%)
85 1683
< 0.1%
99 476
 
< 0.1%
104 574
 
< 0.1%
124 1154
< 0.1%
126 594
 
< 0.1%
131 496
 
< 0.1%
153 1021
< 0.1%
224 1026
< 0.1%
254 707
 
< 0.1%
287 2140
< 0.1%
ValueCountFrequency (%)
1472547 1050
< 0.1%
1340726 1863
< 0.1%
1305340 2428
< 0.1%
1298848 372
 
< 0.1%
1290559 137
 
< 0.1%
1286279 464
 
< 0.1%
1284508 822
 
< 0.1%
1283452 503
 
< 0.1%
1283279 265
 
< 0.1%
1272007 207
 
< 0.1%

Report
Real number (ℝ)

Distinct: 629860
Distinct (%): 7.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18955521
Minimum: 1237515
Maximum: 27823222
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.2 MiB

Quantile statistics

Minimum: 1237515
5-th percentile: 1533431
Q1: 17679873
median: 20802937
Q3: 24266728
95-th percentile: 27118895
Maximum: 27823222
Range: 26585707
Interquartile range (IQR): 6586855

Descriptive statistics

Standard deviation: 7645952.7
Coefficient of variation (CV): 0.40336283
Kurtosis: 0.69463298
Mean: 18955521
Median Absolute Deviation (MAD): 3270230
Skewness: -1.3245439
Sum: 1.6452062 × 10^14
Variance: 5.8460592 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
26739266 195
 
< 0.1%
26740660 195
 
< 0.1%
26739216 195
 
< 0.1%
26739424 195
 
< 0.1%
26740099 191
 
< 0.1%
26739267 191
 
< 0.1%
26740661 191
 
< 0.1%
26739217 191
 
< 0.1%
26737746 191
 
< 0.1%
26739425 191
 
< 0.1%
Other values (629850) 8677372
> 99.9%
ValueCountFrequency (%)
1237515 2
 
< 0.1%
1237550 5
 
< 0.1%
1237586 14
< 0.1%
1237620 4
 
< 0.1%
1237625 2
 
< 0.1%
1237633 2
 
< 0.1%
1237635 9
< 0.1%
1237652 13
< 0.1%
1237653 4
 
< 0.1%
1237658 5
 
< 0.1%
ValueCountFrequency (%)
27823222 3
 
< 0.1%
27823214 9
< 0.1%
27823209 9
< 0.1%
27823190 1
 
< 0.1%
27823188 9
< 0.1%
27823174 3
 
< 0.1%
27823149 1
 
< 0.1%
27823105 19
< 0.1%
27823103 16
< 0.1%
27823101 1
 
< 0.1%

ID
Real number (ℝ)

Distinct: 8604099
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69795856
Minimum: 1282
Maximum: 1.9273763 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.2 MiB

Quantile statistics

Minimum: 1282
5-th percentile: 6458270.8
Q1: 23188354
median: 52265658
Q3: 1.1075503 × 10^8
95-th percentile: 1.6219078 × 10^8
Maximum: 1.9273763 × 10^8
Range: 1.9273635 × 10^8
Interquartile range (IQR): 87566671

Descriptive statistics

Standard deviation: 53137093
Coefficient of variation (CV): 0.7613216
Kurtosis: -0.84762724
Mean: 69795856
Median Absolute Deviation (MAD): 35994544
Skewness: 0.58835003
Sum: 6.0577903 × 10^14
Variance: 2.8235507 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
27803180 2
 
< 0.1%
21865565 2
 
< 0.1%
20650535 2
 
< 0.1%
21656037 2
 
< 0.1%
21656038 2
 
< 0.1%
21656039 2
 
< 0.1%
21656040 2
 
< 0.1%
21656041 2
 
< 0.1%
20650536 2
 
< 0.1%
20650537 2
 
< 0.1%
Other values (8604089) 8679278
> 99.9%
ValueCountFrequency (%)
1282 1
< 0.1%
1491 1
< 0.1%
4796 1
< 0.1%
4797 1
< 0.1%
4798 1
< 0.1%
4799 1
< 0.1%
4800 1
< 0.1%
4801 1
< 0.1%
4802 1
< 0.1%
4803 1
< 0.1%
ValueCountFrequency (%)
192737629 1
< 0.1%
192737628 1
< 0.1%
192737474 1
< 0.1%
192737473 1
< 0.1%
192737472 1
< 0.1%
192737424 1
< 0.1%
192737423 1
< 0.1%
192737422 1
< 0.1%
192737421 1
< 0.1%
192737420 1
< 0.1%

EntryDate
Categorical

Distinct: 2911
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.2 MiB
03.10.2022
 
9468
12.09.2022
 
8834
07.09.2022
 
8647
21.11.2022
 
8515
05.09.2022
 
8333
Other values (2906)
8635501 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 86792980
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 30.09.2022
2nd row: 30.09.2022
3rd row: 16.04.2015
4th row: 16.04.2015
5th row: 16.04.2015

Common Values

ValueCountFrequency (%)
03.10.2022 9468
 
0.1%
12.09.2022 8834
 
0.1%
07.09.2022 8647
 
0.1%
21.11.2022 8515
 
0.1%
05.09.2022 8333
 
0.1%
04.10.2022 7943
 
0.1%
03.05.2022 7581
 
0.1%
10.10.2022 7565
 
0.1%
02.05.2022 7514
 
0.1%
10.01.2022 7488
 
0.1%
Other values (2901) 8597410
99.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
03.10.2022 9468
 
0.1%
12.09.2022 8834
 
0.1%
07.09.2022 8647
 
0.1%
21.11.2022 8515
 
0.1%
05.09.2022 8333
 
0.1%
04.10.2022 7943
 
0.1%
03.05.2022 7581
 
0.1%
10.10.2022 7565
 
0.1%
02.05.2022 7514
 
0.1%
10.01.2022 7488
 
0.1%
Other values (2901) 8597410
99.1%

Most occurring characters

ValueCountFrequency (%)
0 20536993
23.7%
2 18261521
21.0%
. 17358596
20.0%
1 14100046
16.2%
5 2774007
 
3.2%
6 2722345
 
3.1%
9 2706267
 
3.1%
7 2398301
 
2.8%
8 2326732
 
2.7%
3 2018337
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 69434384
80.0%
Other Punctuation 17358596
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 20536993
29.6%
2 18261521
26.3%
1 14100046
20.3%
5 2774007
 
4.0%
6 2722345
 
3.9%
9 2706267
 
3.9%
7 2398301
 
3.5%
8 2326732
 
3.4%
3 2018337
 
2.9%
4 1589835
 
2.3%
Other Punctuation
ValueCountFrequency (%)
. 17358596
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 86792980
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 20536993
23.7%
2 18261521
21.0%
. 17358596
20.0%
1 14100046
16.2%
5 2774007
 
3.2%
6 2722345
 
3.1%
9 2706267
 
3.1%
7 2398301
 
2.8%
8 2326732
 
2.7%
3 2018337
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 86792980
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 20536993
23.7%
2 18261521
21.0%
. 17358596
20.0%
1 14100046
16.2%
5 2774007
 
3.2%
6 2722345
 
3.1%
9 2706267
 
3.1%
7 2398301
 
2.8%
8 2326732
 
2.7%
3 2018337
 
2.3%

EntryTime
Categorical

Distinct: 27227
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 66.2 MiB
30.12.1899 6:00:00
 
505282
30.12.1899 7:00:00
 
169117
30.12.1899 6:30:00
 
146460
30.12.1899 7:30:00
 
105216
30.12.1899 8:00:00
 
80819
Other values (27222)
7672404 

Length

Max length: 19
Median length: 18
Mean length: 18.126613
Min length: 18

Characters and Unicode

Total characters: 157326278
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12782
Unique (%): 0.1%

Sample

1st row: 30.12.1899 7:42:00
2nd row: 30.12.1899 7:42:00
3rd row: 30.12.1899 7:31:00
4th row: 30.12.1899 7:31:00
5th row: 30.12.1899 7:31:00
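
Every sampled EntryTime value carries the date 30.12.1899, which is the zero date of OLE Automation date-time values. This suggests (but the report does not confirm) that the date part is only a placeholder and that the real date lives in EntryDate. A minimal sketch of combining the two columns under that assumption, using a hypothetical helper named `combine_entry`:

```python
from datetime import datetime


def combine_entry(entry_date: str, entry_time: str) -> datetime:
    """Merge an EntryDate string with the clock time of an EntryTime string."""
    date_part = datetime.strptime(entry_date, "%d.%m.%Y").date()
    # Assumption: the 30.12.1899 date in EntryTime is a placeholder, so only
    # the time of day is kept.
    time_part = datetime.strptime(entry_time, "%d.%m.%Y %H:%M:%S").time()
    return datetime.combine(date_part, time_part)


# First sample rows of EntryDate and EntryTime from this report.
print(combine_entry("30.09.2022", "30.12.1899 7:42:00"))
```

Parsing both columns into one timestamp would also let the profiler treat the result as a date-time variable instead of two high-cardinality categoricals.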

Common Values

ValueCountFrequency (%)
30.12.1899 6:00:00 505282
 
5.8%
30.12.1899 7:00:00 169117
 
1.9%
30.12.1899 6:30:00 146460
 
1.7%
30.12.1899 7:30:00 105216
 
1.2%
30.12.1899 8:00:00 80819
 
0.9%
30.12.1899 10:00:00 80726
 
0.9%
30.12.1899 7:15:00 77290
 
0.9%
30.12.1899 6:36:00 75669
 
0.9%
30.12.1899 7:20:00 75477
 
0.9%
30.12.1899 6:37:00 75223
 
0.9%
Other values (27217) 7288019
84.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
30.12.1899 8679298
50.0%
6:00:00 505282
 
2.9%
7:00:00 169117
 
1.0%
6:30:00 146460
 
0.8%
7:30:00 105216
 
0.6%
8:00:00 80819
 
0.5%
10:00:00 80726
 
0.5%
7:15:00 77290
 
0.4%
6:36:00 75669
 
0.4%
7:20:00 75477
 
0.4%
Other values (27218) 7363242
42.4%

Most occurring characters

ValueCountFrequency (%)
0 30232683
19.2%
1 20535435
13.1%
9 18477092
11.7%
. 17358596
11.0%
: 17358596
11.0%
3 11384724
 
7.2%
2 10814043
 
6.9%
8 10284590
 
6.5%
8679298
 
5.5%
7 4398647
 
2.8%
Other values (3) 7802574
 
5.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 113929788
72.4%
Other Punctuation 34717192
 
22.1%
Space Separator 8679298
 
5.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 30232683
26.5%
1 20535435
18.0%
9 18477092
16.2%
3 11384724
 
10.0%
2 10814043
 
9.5%
8 10284590
 
9.0%
7 4398647
 
3.9%
6 3472949
 
3.0%
4 2204150
 
1.9%
5 2125475
 
1.9%
Other Punctuation
ValueCountFrequency (%)
. 17358596
50.0%
: 17358596
50.0%
Space Separator
ValueCountFrequency (%)
8679298
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 157326278
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 30232683
19.2%
1 20535435
13.1%
9 18477092
11.7%
. 17358596
11.0%
: 17358596
11.0%
3 11384724
 
7.2%
2 10814043
 
6.9%
8 10284590
 
6.5%
8679298
 
5.5%
7 4398647
 
2.8%
Other values (3) 7802574
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 157326278
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 30232683
19.2%
1 20535435
13.1%
9 18477092
11.7%
. 17358596
11.0%
: 17358596
11.0%
3 11384724
 
7.2%
2 10814043
 
6.9%
8 10284590
 
6.5%
8679298
 
5.5%
7 4398647
 
2.8%
Other values (3) 7802574
 
5.0%

Code
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2025
Distinct (%): 0.4%
Missing: 8106112
Missing (%): 93.4%
Memory size: 66.2 MiB
41366
77143 
20045
55042 
20044
49730 
40669
49714 
40668
49712 
Other values (2020)
291845 

Length

Max length: 34
Median length: 5
Mean length: 6.6410031
Min length: 3

Characters and Unicode

Total characters: 3806530
Distinct characters: 52
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 498
Unique (%): 0.1%

Sample

1st row: SZU-HEP_0004__R
2nd row: ZU-OKM_0024__R
3rd row: 20411
4th row: 20042
5th row: 20411

Common Values

ValueCountFrequency (%)
41366 77143
 
0.9%
20045 55042
 
0.6%
20044 49730
 
0.6%
40669 49714
 
0.6%
40668 49712
 
0.6%
20411 28244
 
0.3%
20042 27766
 
0.3%
8094 22631
 
0.3%
9520 12429
 
0.1%
9524 12429
 
0.1%
Other values (2015) 188346
 
2.2%
(Missing) 8106112
93.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
41366 77143
 
13.3%
20045 55042
 
9.5%
20044 49730
 
8.6%
40669 49714
 
8.6%
40668 49712
 
8.6%
20411 28244
 
4.9%
20042 27766
 
4.8%
8094 22631
 
3.9%
9520 12429
 
2.2%
9524 12429
 
2.2%
Other values (2044) 193136
33.4%

Most occurring characters

ValueCountFrequency (%)
0 700804
18.4%
4 480708
12.6%
6 378511
9.9%
2 337420
8.9%
_ 264513
 
6.9%
1 247718
 
6.5%
M 174898
 
4.6%
9 162389
 
4.3%
5 143039
 
3.8%
3 139469
 
3.7%
Other values (42) 777061
20.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2717681
71.4%
Uppercase Letter 718544
 
18.9%
Connector Punctuation 264513
 
6.9%
Dash Punctuation 87989
 
2.3%
Lowercase Letter 10035
 
0.3%
Space Separator 4790
 
0.1%
Other Punctuation 2978
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 174898
24.3%
L 89636
12.5%
I 88036
12.3%
E 87777
12.2%
P 87758
12.2%
K 87703
12.2%
R 60557
 
8.4%
S 27132
 
3.8%
C 3728
 
0.5%
H 3227
 
0.4%
Other values (15) 8092
 
1.1%
Lowercase Letter
ValueCountFrequency (%)
a 2242
22.3%
e 1916
19.1%
s 958
9.5%
r 958
9.5%
i 958
9.5%
o 958
9.5%
g 958
9.5%
h 958
9.5%
w 103
 
1.0%
c 23
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 700804
25.8%
4 480708
17.7%
6 378511
13.9%
2 337420
12.4%
1 247718
 
9.1%
9 162389
 
6.0%
5 143039
 
5.3%
3 139469
 
5.1%
8 103770
 
3.8%
7 23853
 
0.9%
Other Punctuation
ValueCountFrequency (%)
: 1312
44.1%
* 1312
44.1%
, 354
 
11.9%
Connector Punctuation
ValueCountFrequency (%)
_ 264513
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 87989
100.0%
Space Separator
ValueCountFrequency (%)
4790
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3077951
80.9%
Latin 728579
 
19.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 174898
24.0%
L 89636
12.3%
I 88036
12.1%
E 87777
12.0%
P 87758
12.0%
K 87703
12.0%
R 60557
 
8.3%
S 27132
 
3.7%
C 3728
 
0.5%
H 3227
 
0.4%
Other values (26) 18127
 
2.5%
Common
ValueCountFrequency (%)
0 700804
22.8%
4 480708
15.6%
6 378511
12.3%
2 337420
11.0%
_ 264513
 
8.6%
1 247718
 
8.0%
9 162389
 
5.3%
5 143039
 
4.6%
3 139469
 
4.5%
8 103770
 
3.4%
Other values (6) 119610
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3806530
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 700804
18.4%
4 480708
12.6%
6 378511
9.9%
2 337420
8.9%
_ 264513
 
6.9%
1 247718
 
6.5%
M 174898
 
4.6%
9 162389
 
4.3%
5 143039
 
3.8%
3 139469
 
3.7%
Other values (42) 777061
20.4%

NCLP
Real number (ℝ)

Distinct: 1619
Distinct (%): < 0.1%
Missing: 573524
Missing (%): 6.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7552.2215
Minimum: 53
Maximum: 53191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.2 MiB

Quantile statistics

Minimum: 53
5-th percentile: 921
Q1: 3078
median: 4769
Q3: 12460
95-th percentile: 17339
Maximum: 53191
Range: 53138
Interquartile range (IQR): 9382

Descriptive statistics

Standard deviation: 6188.5202
Coefficient of variation (CV): 0.81943044
Kurtosis: 5.7807276
Mean: 7552.2215
Median Absolute Deviation (MAD): 3419
Skewness: 1.4040591
Sum: 6.1216601 × 10^10
Variance: 38297782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8574 144122
 
1.7%
1675 134565
 
1.6%
2688 134542
 
1.6%
1991 134540
 
1.6%
13808 134536
 
1.6%
2099 134488
 
1.5%
2419 134484
 
1.5%
4726 134482
 
1.5%
4769 134480
 
1.5%
16263 134308
 
1.5%
Other values (1609) 6751227
77.8%
(Missing) 573524
 
6.6%
ValueCountFrequency (%)
53 147
 
< 0.1%
54 145
 
< 0.1%
80 1546
< 0.1%
82 871
< 0.1%
88 1
 
< 0.1%
116 1546
< 0.1%
118 871
< 0.1%
124 1
 
< 0.1%
149 1
 
< 0.1%
159 908
< 0.1%
ValueCountFrequency (%)
53191 21
 
< 0.1%
52685 1409
 
< 0.1%
52684 151
 
< 0.1%
52683 1
 
< 0.1%
52681 7807
0.1%
52680 1671
 
< 0.1%
52674 7
 
< 0.1%
52673 32
 
< 0.1%
52672 3632
< 0.1%
52671 267
 
< 0.1%

Analyte
Categorical

Distinct: 3479
Distinct (%): < 0.1%
Missing: 1923
Missing (%): < 0.1%
Memory size: 66.2 MiB
s_kreatinin
 
144100
B_Hemoglobin
 
134480
B_Erytrocyty
 
134479
B_Hematokrit
 
134479
B_Trombocyty
 
134479
Other values (3474)
7995358 

Length

Max length: 67
Median length: 51
Mean length: 10.656878
Min length: 1

Characters and Unicode

Total characters: 92473723
Distinct characters: 110
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 920
Unique (%): < 0.1%

Sample

1st row: HBsAg konfirmace
2nd row: Rotaviry + Noroviry stolice
3rd row: s_vápník celk.
4th row: s_fosfor
5th row: s_hořčík

Common Values

ValueCountFrequency (%)
s_kreatinin 144100
 
1.7%
B_Hemoglobin 134480
 
1.5%
B_Erytrocyty 134479
 
1.5%
B_Hematokrit 134479
 
1.5%
B_Trombocyty 134479
 
1.5%
B_Leukocyty 134479
 
1.5%
B_MCV 134477
 
1.5%
B_MCH 134477
 
1.5%
B_MCHC 134477
 
1.5%
B_RDW 134476
 
1.5%
Other values (3469) 7322972
84.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
cholest 311720
 
2.9%
celk 272070
 
2.5%
b_thr 270278
 
2.5%
uf 249467
 
2.3%
u_epitelie 175452
 
1.6%
u_leukocyty 151861
 
1.4%
u_válce 151817
 
1.4%
s_kreatinin 144132
 
1.3%
objem 139587
 
1.3%
b_leukocyty 136424
 
1.3%
Other values (3063) 8889437
81.6%

Most occurring characters

ValueCountFrequency (%)
_ 8945969
 
9.7%
o 5253273
 
5.7%
s 4641648
 
5.0%
e 4550671
 
4.9%
t 4061586
 
4.4%
l 4008082
 
4.3%
r 3925049
 
4.2%
i 3914180
 
4.2%
y 3519438
 
3.8%
a 3510497
 
3.8%
Other values (100) 46143330
49.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 59839356
64.7%
Uppercase Letter 18266690
 
19.8%
Connector Punctuation 8945969
 
9.7%
Space Separator 2233708
 
2.4%
Other Punctuation 1487831
 
1.6%
Decimal Number 803795
 
0.9%
Dash Punctuation 521520
 
0.6%
Math Symbol 142496
 
0.2%
Open Punctuation 113902
 
0.1%
Close Punctuation 113887
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5253273
 
8.8%
s 4641648
 
7.8%
e 4550671
 
7.6%
t 4061586
 
6.8%
l 4008082
 
6.7%
r 3925049
 
6.6%
i 3914180
 
6.5%
y 3519438
 
5.9%
a 3510497
 
5.9%
n 3279090
 
5.5%
Other values (33) 19175842
32.0%
Uppercase Letter
ValueCountFrequency (%)
B 3229915
17.7%
U 1947634
10.7%
T 1914675
10.5%
H 1240246
 
6.8%
L 1115814
 
6.1%
C 1113063
 
6.1%
D 1038222
 
5.7%
A 960739
 
5.3%
P 837597
 
4.6%
M 706294
 
3.9%
Other values (23) 4162491
22.8%
Other Punctuation
ValueCountFrequency (%)
. 1384974
93.1%
% 62467
 
4.2%
/ 34598
 
2.3%
, 3441
 
0.2%
: 1447
 
0.1%
; 579
 
< 0.1%
* 254
 
< 0.1%
# 66
 
< 0.1%
? 3
 
< 0.1%
! 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 377494
47.0%
2 214601
26.7%
3 82471
 
10.3%
4 58623
 
7.3%
9 27872
 
3.5%
5 14211
 
1.8%
0 10738
 
1.3%
8 7350
 
0.9%
6 6765
 
0.8%
7 3670
 
0.5%
Math Symbol
ValueCountFrequency (%)
> 134381
94.3%
+ 8111
 
5.7%
= 4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2233702
> 99.9%
  6
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 521512
> 99.9%
8
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 113016
99.2%
[ 886
 
0.8%
Close Punctuation
ValueCountFrequency (%)
) 113007
99.2%
] 880
 
0.8%
Other Symbol
ValueCountFrequency (%)
4557
99.7%
° 12
 
0.3%
Connector Punctuation
ValueCountFrequency (%)
_ 8945969
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 78106046
84.5%
Common 14367677
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5253273
 
6.7%
s 4641648
 
5.9%
e 4550671
 
5.8%
t 4061586
 
5.2%
l 4008082
 
5.1%
r 3925049
 
5.0%
i 3914180
 
5.0%
y 3519438
 
4.5%
a 3510497
 
4.5%
n 3279090
 
4.2%
Other values (66) 37442532
47.9%
Common
ValueCountFrequency (%)
_ 8945969
62.3%
2233702
 
15.5%
. 1384974
 
9.6%
- 521512
 
3.6%
1 377494
 
2.6%
2 214601
 
1.5%
> 134381
 
0.9%
( 113016
 
0.8%
) 113007
 
0.8%
3 82471
 
0.6%
Other values (24) 246550
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90120659
97.5%
None 2348499
 
2.5%
Letterlike Symbols 4557
 
< 0.1%
Punctuation 8
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 8945969
 
9.9%
o 5253273
 
5.8%
s 4641648
 
5.2%
e 4550671
 
5.0%
t 4061586
 
4.5%
l 4008082
 
4.4%
r 3925049
 
4.4%
i 3914180
 
4.3%
y 3519438
 
3.9%
a 3510497
 
3.9%
Other values (72) 43790266
48.6%
None
ValueCountFrequency (%)
í 659737
28.1%
á 573573
24.4%
č 279755
11.9%
é 262785
 
11.2%
ó 251870
 
10.7%
ř 104270
 
4.4%
ž 94351
 
4.0%
ý 51167
 
2.2%
š 30613
 
1.3%
ů 26985
 
1.1%
Other values (16) 13393
 
0.6%
Letterlike Symbols
ValueCountFrequency (%)
4557
100.0%
Punctuation
ValueCountFrequency (%)
8
100.0%

ValueNumber
Categorical

HIGH CARDINALITY
MISSING

Distinct: 23084
Distinct (%): 0.3%
Missing: 1338881
Missing (%): 15.4%
Memory size: 66.2 MiB
0
 
435532
1
 
68671
2
 
50246
5
 
44935
5.5
 
41213
Other values (23079)
6699820 

Length

Max length: 8
Median length: 7
Mean length: 3.3814182
Min length: 1

Characters and Unicode

Total characters: 24821020
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8522
Unique (%): 0.1%

Sample

1st row: 2.37
2nd row: 1.32
3rd row: 0.75
4th row: 138.9
5th row: 10.36
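
ValueNumber is typed as Categorical even though the sampled values are plain decimal strings, and the character tables for this variable also list space and dash characters, so at least some entries are probably not plain numbers. A minimal sketch of a tolerant conversion follows; the helper name `parse_value_number` and the `"1 - 2"` test string are hypothetical and not taken from the data.

```python
from typing import Optional


def parse_value_number(raw: Optional[str]) -> Optional[float]:
    """Return the value as a float, or None when it is missing or not a plain number."""
    if raw is None:
        return None
    text = raw.strip()
    try:
        return float(text)
    except ValueError:
        # Anything that is not a plain decimal is left for manual review.
        return None


# The first three strings appear in this report; "1 - 2" is a made-up example.
samples = ["2.37", "138.9", "0", None, "1 - 2"]
print([parse_value_number(s) for s in samples])
```

Returning None instead of raising keeps the unparseable entries visible as missing values, so they can be counted and inspected before deciding how to handle them.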

Common Values

ValueCountFrequency (%)
0 435532
 
5.0%
1 68671
 
0.8%
2 50246
 
0.6%
5 44935
 
0.5%
5.5 41213
 
0.5%
3 40192
 
0.5%
0.3 39812
 
0.5%
0.4 38905
 
0.4%
0.02 36362
 
0.4%
0.2 35349
 
0.4%
Other values (23074) 6509200
75.0%
(Missing) 1338881
 
15.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 435532
 
5.9%
1 68935
 
0.9%
2 50451
 
0.7%
5 45024
 
0.6%
5.5 41304
 
0.6%
3 40390
 
0.5%
0.3 40100
 
0.5%
0.4 39206
 
0.5%
0.02 36362
 
0.5%
0.2 35645
 
0.5%
Other values (22863) 6519674
88.7%

Most occurring characters

ValueCountFrequency (%)
. 5542547
22.3%
1 3388325
13.7%
0 2682178
10.8%
3 2290612
9.2%
2 2112167
 
8.5%
4 2019656
 
8.1%
5 1609675
 
6.5%
6 1345696
 
5.4%
7 1277717
 
5.1%
8 1268324
 
5.1%
Other values (3) 1284123
 
5.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 19254061
77.6%
Other Punctuation 5542547
 
22.3%
Space Separator 12206
 
< 0.1%
Dash Punctuation 12206
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 3388325
17.6%
0 2682178
13.9%
3 2290612
11.9%
2 2112167
11.0%
4 2019656
10.5%
5 1609675
8.4%
6 1345696
 
7.0%
7 1277717
 
6.6%
8 1268324
 
6.6%
9 1259711
 
6.5%
Other Punctuation
ValueCountFrequency (%)
. 5542547
100.0%
Space Separator
ValueCountFrequency (%)
12206
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 12206
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 24821020
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 5542547
22.3%
1 3388325
13.7%
0 2682178
10.8%
3 2290612
9.2%
2 2112167
 
8.5%
4 2019656
 
8.1%
5 1609675
 
6.5%
6 1345696
 
5.4%
7 1277717
 
5.1%
8 1268324
 
5.1%
Other values (3) 1284123
 
5.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24821020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 5542547
22.3%
1 3388325
13.7%
0 2682178
10.8%
3 2290612
9.2%
2 2112167
 
8.5%
4 2019656
 
8.1%
5 1609675
 
6.5%
6 1345696
 
5.4%
7 1277717
 
5.1%
8 1268324
 
5.1%
Other values (3) 1284123
 
5.2%

ValueText
Categorical

HIGH CARDINALITY
MISSING

Distinct: 32847
Distinct (%): 2.5%
Missing: 7340434
Missing (%): 84.6%
Memory size: 66.2 MiB
negativní
485411 
normalní nález
138384 
normal
105600 
ojediněle
65120 
normální nález
60485 
Other values (32842)
483864 

Length

Max length: 1920
Median length: 1807
Mean length: 17.411001
Min length: 1

Characters and Unicode

Total characters: 23310962
Distinct characters: 134
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26196
Unique (%): 2.0%

Sample

1st row: externě
2nd row: odběr
3rd row: normal
4th row: negativní
5th row: negativní

Common Values

ValueCountFrequency (%)
negativní 485411
 
5.6%
normalní nález 138384
 
1.6%
normal 105600
 
1.2%
ojediněle 65120
 
0.8%
normální nález 60485
 
0.7%
ordinace 50572
 
0.6%
nelze hodnotit 37990
 
0.4%
odběr 27114
 
0.3%
Není známo, zda pacient užívá Warfarin (kumariny) 20592
 
0.2%
POZOR!!! Změna principu stanovení močového sedimentu (průtoková cytometrie). 18463
 
0.2%
Other values (32837) 329133
 
3.8%
(Missing) 7340434
84.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
negativní 488157
 
16.4%
nález 199216
 
6.7%
normalní 138384
 
4.7%
normal 105650
 
3.6%
pacient 71962
 
2.4%
ojediněle 65372
 
2.2%
63172
 
2.1%
užívá 62016
 
2.1%
normální 60644
 
2.0%
ordinace 50576
 
1.7%
Other values (32105) 1669227
56.1%

Most occurring characters

ValueCountFrequency (%)
n 2792801
 
12.0%
1724095
 
7.4%
e 1683160
 
7.2%
a 1485544
 
6.4%
o 1331305
 
5.7%
i 1186964
 
5.1%
t 1032153
 
4.4%
í 932939
 
4.0%
r 895491
 
3.8%
v 850125
 
3.6%
Other values (124) 9396385
40.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 18388306
78.9%
Space Separator 1724095
 
7.4%
Uppercase Letter 1175047
 
5.0%
Decimal Number 706778
 
3.0%
Other Punctuation 597388
 
2.6%
Control 225098
 
1.0%
Dash Punctuation 201138
 
0.9%
Math Symbol 123628
 
0.5%
Open Punctuation 77318
 
0.3%
Close Punctuation 77257
 
0.3%
Other values (3) 14909
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 2792801
15.2%
e 1683160
 
9.2%
a 1485544
 
8.1%
o 1331305
 
7.2%
i 1186964
 
6.5%
t 1032153
 
5.6%
í 932939
 
5.1%
r 895491
 
4.9%
v 850125
 
4.6%
l 809443
 
4.4%
Other values (35) 5388381
29.3%
Uppercase Letter
ValueCountFrequency (%)
P 116360
 
9.9%
O 101828
 
8.7%
N 91753
 
7.8%
A 63780
 
5.4%
R 63176
 
5.4%
E 55901
 
4.8%
L 54974
 
4.7%
M 54278
 
4.6%
T 53950
 
4.6%
Z 53932
 
4.6%
Other values (34) 465115
39.6%
Other Punctuation
ValueCountFrequency (%)
. 269783
45.2%
, 111861
18.7%
: 110138
18.4%
! 72319
 
12.1%
/ 19252
 
3.2%
* 4323
 
0.7%
% 2824
 
0.5%
' 2313
 
0.4%
@ 2071
 
0.3%
" 1171
 
0.2%
Other values (4) 1333
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 185691
26.3%
1 136417
19.3%
2 126011
17.8%
3 53289
 
7.5%
5 49627
 
7.0%
4 45369
 
6.4%
8 31105
 
4.4%
6 29922
 
4.2%
9 25107
 
3.6%
7 24240
 
3.4%
Math Symbol
ValueCountFrequency (%)
+ 79362
64.2%
< 37102
30.0%
> 4171
 
3.4%
= 2370
 
1.9%
| 623
 
0.5%
Control
ValueCountFrequency (%)
150058
66.7%
75038
33.3%
2
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 77293
> 99.9%
21
 
< 0.1%
[ 4
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
´ 61
57.0%
` 42
39.3%
¨ 4
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 201137
> 99.9%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 77253
> 99.9%
] 4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1724095
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 14717
100.0%
Other Symbol
ValueCountFrequency (%)
° 85
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19563176
83.9%
Common 3747786
 
16.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 2792801
14.3%
e 1683160
 
8.6%
a 1485544
 
7.6%
o 1331305
 
6.8%
i 1186964
 
6.1%
t 1032153
 
5.3%
í 932939
 
4.8%
r 895491
 
4.6%
v 850125
 
4.3%
l 809443
 
4.1%
Other values (78) 6563251
33.5%
Common
ValueCountFrequency (%)
1724095
46.0%
. 269783
 
7.2%
- 201137
 
5.4%
0 185691
 
5.0%
150058
 
4.0%
1 136417
 
3.6%
2 126011
 
3.4%
, 111861
 
3.0%
: 110138
 
2.9%
+ 79362
 
2.1%
Other values (36) 653233
 
17.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 21227794
91.1%
None 2083146
 
8.9%
Punctuation 22
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 2792801
 
13.2%
1724095
 
8.1%
e 1683160
 
7.9%
a 1485544
 
7.0%
o 1331305
 
6.3%
i 1186964
 
5.6%
t 1032153
 
4.9%
r 895491
 
4.2%
v 850125
 
4.0%
l 809443
 
3.8%
Other values (82) 7436713
35.0%
None
ValueCountFrequency (%)
í 932939
44.8%
á 517862
24.9%
ě 177918
 
8.5%
ž 105876
 
5.1%
é 79957
 
3.8%
ř 56075
 
2.7%
ý 51172
 
2.5%
č 50696
 
2.4%
ů 36807
 
1.8%
š 21368
 
1.0%
Other values (30) 52476
 
2.5%
Punctuation
ValueCountFrequency (%)
21
95.5%
1
 
4.5%

RefHigh
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 1747017
Missing (%): 20.1%
Memory size: 66.2 MiB

RefLow
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 1747253
Missing (%): 20.1%
Memory size: 66.2 MiB

Unit
Categorical

HIGH CARDINALITY
MISSING

Distinct: 195
Distinct (%): < 0.1%
Missing: 1383102
Missing (%): 15.9%
Memory size: 66.2 MiB
mmol/l
1478479 
%
868045 
x10^9/l
403422 
fl
401148 
g/l
397441 
Other values (190)
3747661 

Length

Max length: 22
Median length: 15
Mean length: 4.6795038
Min length: 1

Characters and Unicode

Total characters: 34142577
Distinct characters: 70
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19
Unique (%): < 0.1%

Sample

1st row: mmol/l
2nd row: mmol/l
3rd row: mmol/l
4th row: umol/l
5th row: mmol/l

Common Values

ValueCountFrequency (%)
mmol/l 1478479
17.0%
% 868045
 
10.0%
x10^9/l 403422
 
4.6%
fl 401148
 
4.6%
g/l 397441
 
4.6%
ukat/l 343569
 
4.0%
10^9/l 322382
 
3.7%
umol/l 246669
 
2.8%
µkat/l 236565
 
2.7%
/uL 224902
 
2.6%
Other values (185) 2373574
27.3%
(Missing) 1383102
15.9%
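
Some entries in this table look like spelling variants of the same unit (ukat/l and µkat/l, x10^9/l and 10^9/l). The report does not say whether they are equivalent, so the mapping below is an assumption that should be checked against the laboratory's unit list. As a sketch, with hypothetical names `UNIT_ALIASES` and `normalize_unit`:

```python
# Assumption: these spellings denote the same unit; verify before relying on it.
UNIT_ALIASES = {
    "\u00b5kat/l": "ukat/l",  # micro sign (U+00B5)
    "\u03bckat/l": "ukat/l",  # Greek small letter mu (U+03BC)
    "x10^9/l": "10^9/l",
}


def normalize_unit(unit):
    """Map known spelling variants to one form; leave other units unchanged."""
    if unit is None:
        return None
    cleaned = unit.strip()
    return UNIT_ALIASES.get(cleaned, cleaned)


print(normalize_unit("x10^9/l"))
print(normalize_unit("mmol/l"))
```

An explicit alias table is used here rather than blanket character replacement, so that each merge is a visible, reviewable decision.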

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mmol/l 1501468
20.1%
974603
13.1%
x10^9/l 403422
 
5.4%
fl 401203
 
5.4%
g/l 397527
 
5.3%
ukat/l 343753
 
4.6%
10^9/l 322389
 
4.3%
ul 269842
 
3.6%
umol/l 246762
 
3.3%
µkat/l 236597
 
3.2%
Other values (159) 2355249
31.6%

Most occurring characters

ValueCountFrequency (%)
l 7578497
22.2%
/ 5429095
15.9%
m 4615170
13.5%
o 2399525
 
7.0%
1 1214359
 
3.6%
0 922814
 
2.7%
u 891088
 
2.6%
^ 872267
 
2.6%
% 871027
 
2.6%
g 833474
 
2.4%
Other values (60) 8515261
24.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 21727754
63.6%
Other Punctuation 6795749
 
19.9%
Decimal Number 3588693
 
10.5%
Uppercase Letter 895158
 
2.6%
Modifier Symbol 872267
 
2.6%
Space Separator 156619
 
0.5%
Dash Punctuation 106315
 
0.3%
Math Symbol 14
 
< 0.1%
Connector Punctuation 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
l                     7578497   34.9%
m                     4615170   21.2%
o                     2399525   11.0%
u                     891088    4.1%
g                     833474    3.8%
k                     794400    3.7%
a                     748953    3.4%
t                     614977    2.8%
µ                     608736    2.8%
x                     537592    2.5%
Other values (19)     2105342   9.7%
Uppercase Letter
Value                 Count     Frequency (%)
L                     426693    47.7%
I                     152789    17.1%
U                     143961    16.1%
P                     60302     6.7%
R                     44442     5.0%
N                     41656     4.7%
E                     5741      0.6%
Y                     5270      0.6%
A                     4148      0.5%
F                     3327      0.4%
Other values (9)      6829      0.8%
Decimal Number
Value                 Count     Frequency (%)
1                     1214359   33.8%
0                     922814    25.7%
9                     775573    21.6%
2                     287393    8.0%
3                     223788    6.2%
7                     147876    4.1%
6                     12125     0.3%
4                     4747      0.1%
5                     18        < 0.1%
Other Punctuation
Value                 Count     Frequency (%)
/                     5429095   79.9%
%                     871027    12.8%
.                     347724    5.1%
,                     147872    2.2%
*                     29        < 0.1%
"                     2         < 0.1%
Math Symbol
Value                 Count     Frequency (%)
<                     6         42.9%
>                     6         42.9%
+                     2         14.3%
Modifier Symbol
Value                 Count     Frequency (%)
^                     872267    100.0%
Space Separator
Value                 Count     Frequency (%)
(space)               156619    100.0%
Dash Punctuation
Value                 Count     Frequency (%)
-                     106315    100.0%
Connector Punctuation
Value                 Count     Frequency (%)
_                     8         100.0%

Most occurring scripts

Value                 Count      Frequency (%)
Latin                 22014176   64.5%
Common                12128401   35.5%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
l                     7578497   34.4%
m                     4615170   21.0%
o                     2399525   10.9%
u                     891088    4.0%
g                     833474    3.8%
k                     794400    3.6%
a                     748953    3.4%
t                     614977    2.8%
x                     537592    2.4%
L                     426693    1.9%
Other values (37)     2573807   11.7%
Common
Value                 Count     Frequency (%)
/                     5429095   44.8%
1                     1214359   10.0%
0                     922814    7.6%
^                     872267    7.2%
%                     871027    7.2%
9                     775573    6.4%
µ                     608736    5.0%
.                     347724    2.9%
2                     287393    2.4%
3                     223788    1.8%
Other values (13)     575625    4.7%

Most occurring blocks

Value                 Count      Frequency (%)
ASCII                 33488291   98.1%
None                  654286     1.9%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
l                     7578497   22.6%
/                     5429095   16.2%
m                     4615170   13.8%
o                     2399525   7.2%
1                     1214359   3.6%
0                     922814    2.8%
u                     891088    2.7%
^                     872267    2.6%
%                     871027    2.6%
g                     833474    2.5%
Other values (55)     7860975   23.5%
None
Value                 Count     Frequency (%)
µ                     608736    93.0%
č                     43054     6.6%
í                     2382      0.4%
Č                     107       < 0.1%
ě                     7         < 0.1%

NLCP_E
Unsupported

REJECTED
UNSUPPORTED

Missing 0
Missing (%) 0.0%
Memory size 66.2 MiB

NCLP_E
Unsupported

REJECTED
UNSUPPORTED

Missing 0
Missing (%) 0.0%
Memory size 66.2 MiB

Interactions

[Interaction plots: 16 images, not reproduced here]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns, according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
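
To make the Cramér's V entries in the mapping concrete, here is a minimal sketch in plain Python. It computes the uncorrected Cramér's V from a contingency table and, for the Numerical-Categorical case, first bins the numeric column. The function names `cramers_v` and `discretize`, the equal-width binning, and the sample data are assumptions made for illustration; the report's own implementation may bin differently or apply a bias correction.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map numeric values to equal-width bin indices 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a single category carries no association
    return math.sqrt(chi2 / (n * k))


# Hypothetical numeric column paired with a categorical label.
values = [0.1, 0.2, 0.9, 1.0]
labels = ["low", "low", "high", "high"]
print(cramers_v(discretize(values, n_bins=2), labels))
```

Changing `n_bins` in this sketch plays the same role as the `n_bins` setting mentioned above: more bins measure a finer-grained association but leave fewer observations per cell.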

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
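
The calculation described above can be sketched in plain Python: rank both variables (ties receive the average rank), then divide the covariance of the ranks by the product of their standard deviations. The helper names and sample data are illustrative assumptions, and the sketch assumes neither input is constant.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x squared).
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(spearman(xs, ys), pearson(xs, ys))
```

Because the relationship is monotonic, the ranks of `xs` and `ys` coincide and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.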

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
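
A small sketch of this formula in plain Python, which also checks the invariance under changes of location and scale. The function name `pearson` and the sample data are assumptions for illustration, and the sketch assumes neither input is constant.

```python
import math


def pearson(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1.0, 2.0, 3.0, 4.0]
steep = [10.0 * x + 3.0 for x in xs]    # steep rising line
shallow = [0.1 * x - 2.0 for x in xs]   # shallow rising line
falling = [-2.0 * x for x in xs]        # falling line
noisy = [1.2, 1.9, 3.4, 3.8]            # hypothetical noisy measurements

print(pearson(xs, steep), pearson(xs, shallow), pearson(xs, falling))
# Shifting and rescaling xs leaves r against the noisy values unchanged.
print(pearson(xs, noisy), pearson([100.0 * x + 7.0 for x in xs], noisy))
```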

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
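
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched directly in plain Python. The function name `kendall_tau_a` and the sample data are illustrative assumptions. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted tau-b variant, so results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, by brute-force pair counting."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither total
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
print(tau)
```

This brute-force version visits every pair, so it is quadratic in the number of observations and only suitable for small samples.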

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
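
One way to read nullity correlation is as the Pearson correlation between present/absent indicators of two columns. The sketch below makes that assumption explicit; the function name `nullity_correlation` and the tiny sample columns are hypothetical, with `None` standing in for a missing cell.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / (sa * sb)


# Hypothetical columns: a numeric result, its unit, and a free-text result.
value_number = [2.37, None, 1.32, None]
unit = ["mmol/l", None, "mmol/l", None]
value_text = [None, "negative", None, "normal"]

print(nullity_correlation(value_number, unit))        # missing together
print(nullity_correlation(value_number, value_text))  # one present when the other is missing
```

A value near +1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be filled exactly when the other is empty.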

Sample

PatientReportIDEntryDateEntryTimeCodeNCLPAnalyteValueNumberValueTextRefHighRefLowUnitNLCP_ENCLP_E
03247292720017115421471230.09.202230.12.1899 7:42:00SZU-HEP_0004__RNaNHBsAg konfirmaceNaNexterněNaNNaNNaN
13247292720017115421472030.09.202230.12.1899 7:42:00ZU-OKM_0024__RNaNRotaviry + Noroviry stoliceNaNodběrNaNNaNNaN
232472914531551824435516.04.201530.12.1899 7:31:00NaN3482.0s_vápník celk.2.37NaN2.552.15mmol/l52672.052672.0
332472914531551824435616.04.201530.12.1899 7:31:00NaN2618.0s_fosfor1.32NaN1.230.71mmol/l2618.02618.0
432472914531551824435716.04.201530.12.1899 7:31:00NaN3940.0s_hořčík0.75NaN0.940.71mmol/l3940.03940.0
532472914531551824435816.04.201530.12.1899 7:31:00NaN8574.0s_kreatinin138.9NaN104.064.0umol/l8574.08574.0
632472914531551824435916.04.201530.12.1899 7:31:00NaN1896.0p_glukóza (NaF)10.36NaN5.593.60mmol/l51802.051802.0
732472914531551824436016.04.201530.12.1899 7:31:00NaN1350.0s_cholesterol3.5NaN5.02.9mmol/l1350.01350.0
832472914531551824436116.04.201530.12.1899 7:31:00NaN12374.0s_TAG1.77NaN1.690.50mmol/l12374.012374.0
932472914531551824436216.04.201530.12.1899 7:31:00NaN2036.0s_HDL cholest.0.93NaN2.11.00mmol/l2036.02036.0
PatientReportIDEntryDateEntryTimeCodeNCLPAnalyteValueNumberValueTextRefHighRefLowUnitNLCP_ENCLP_E
86792883354722649046214094284213.05.202230.12.1899 6:47:00NaN9189.0U_válce hyalinní UF3NaN2.00/µL9189.09189.0
86792893354722649046214094284313.05.202230.12.1899 6:47:00NaN15160.0U_epitelie renální UFNaNnormální nálezNaNNaNNaN15160.015160.0
86792903354722649046214094284413.05.202230.12.1899 6:47:00NaN14013.0U_epitelie přechodné UF0NaN0.00/µL14013.014013.0
86792913354722649046214094284513.05.202230.12.1899 6:47:00NaN14011.0U_epitelie dlaž.UF1NaN20.00/µL14011.014011.0
86792923354722649046214094284613.05.202230.12.1899 6:47:00NaN3414.0U_protein celk.NaNnegativníNaNNaNNaN3414.03414.0
86792933354722649046214094284713.05.202230.12.1899 6:47:00NaN3386.0U_shluky leukocytů UF0NaNNaNNaNarb.j.3386.03386.0
86792943354722649046214094284813.05.202230.12.1899 6:47:00NaN3280.0U_bilirubinNaNnegativníNaNNaNNaN3280.03280.0
86792953354722649046214094284913.05.202230.12.1899 6:47:00NaN9322.0U_leukocyty UF5NaN10.00/µL9322.09322.0
86792963354722649046214094285013.05.202230.12.1899 6:47:00NaN3078.0s_kys.močová327NaN420.0210µmol/l3078.03078.0
86792973354722740926115844795329.09.202230.12.1899 6:44:00NaN15194.0B_HbA1c53NaN42.020.0mmol/mol15194.015194.0